Search results for "Minimum description length"

Showing 4 of 4 documents

State transition identification in multivariate time series (STIMTS) applied to rotational jump trajectories from single molecules

2018

Time-resolved data from single-molecule experiments often suffer from noise contamination due to a low signal level. Identifying a proper model to describe the data thus requires an approach with sufficient model parameters that does not misinterpret noise as relevant data. Here, we report on a generalized data evaluation process to extract states with piecewise constant signal level from simultaneously recorded multivariate data, typical for multichannel single-molecule experiments. The method employs the minimum description length principle to avoid overfitting the data by using an objective function based on a tradeoff between fitting accuracy and model complexity. We val…

Keywords: 0301 basic medicine, Physics, Noise (signal processing), Monte Carlo method, General Physics and Astronomy, Overfitting, Synthetic data, 03 medical and health sciences, Time resolved data, 030104 developmental biology, Piecewise, Jump, Statistical physics, Physical and Theoretical Chemistry, Minimum description length
Journal: The Journal of Chemical Physics
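The MDL tradeoff between fitting accuracy and model complexity can be illustrated with a minimal Python sketch. It is not the STIMTS method: the Gaussian noise model with a known, shared `sigma`, the per-segment cost of (d/2)·log2(n) + log2(n) bits, and the function name `mdl_segment` are assumptions chosen for illustration. Dynamic programming over segment end points finds the piecewise-constant segmentation with the shortest total code length.

```python
import math
from typing import List, Sequence, Tuple


def mdl_segment(data: Sequence[Sequence[float]], sigma: float) -> List[Tuple[int, int]]:
    """Split a multivariate series into piecewise-constant segments by
    minimising a two-part description length (in bits).

    Illustrative assumptions (not the STIMTS objective): Gaussian noise with a
    known, shared standard deviation `sigma`; each segment costs
    (d/2) * log2(n) bits for its d channel means plus log2(n) bits for its
    boundary. Constant terms shared by all segmentations are dropped.
    """
    n, d = len(data), len(data[0])
    # Prefix sums of values and squared values per channel give O(d) segment costs.
    prefix = [[0.0] * d]
    prefix_sq = [[0.0] * d]
    for row in data:
        prefix.append([p + x for p, x in zip(prefix[-1], row)])
        prefix_sq.append([p + x * x for p, x in zip(prefix_sq[-1], row)])

    def data_bits(i: int, j: int) -> float:
        # Residual sum of squares of samples i..j-1 around their per-channel
        # mean, converted to bits of Gaussian negative log-likelihood.
        m = j - i
        rss = 0.0
        for c in range(d):
            s = prefix[j][c] - prefix[i][c]
            rss += (prefix_sq[j][c] - prefix_sq[i][c]) - s * s / m
        return max(rss, 0.0) / (2 * sigma ** 2 * math.log(2))

    model_bits = (d / 2 + 1) * math.log2(n)  # model cost of one segment
    best = [0.0] + [math.inf] * n  # best[j]: shortest code length for data[:j]
    back = [0] * (n + 1)  # back[j]: start of the last segment in that code
    for j in range(1, n + 1):
        for i in range(j):
            total = best[i] + data_bits(i, j) + model_bits
            if total < best[j]:
                best[j], back[j] = total, i

    # Follow the back-pointers to recover the segments as (start, end) pairs.
    segments = []
    j = n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1]
```

For a noise-free two-channel step, `mdl_segment([(0.0, 0.0)] * 10 + [(5.0, 1.0)] * 10, sigma=0.5)` returns `[(0, 10), (10, 20)]`: a third segment would add about 8.6 bits of model cost without saving any data bits. The double loop is O(n²·d), which suits short traces but not long recordings.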

Discovering unbounded unions of regular pattern languages from positive examples

1996

The problem of learning unions of certain pattern languages from positive examples is considered. We restrict attention to regular patterns, i.e., patterns in which each variable symbol can appear only once, and to substring patterns, a subclass of regular patterns of the form xαy, where x and y are variables and α is a string of constant symbols. We present an algorithm that, given a set of strings, finds a good collection of patterns covering this set. A ‘good covering’ is defined as the most probable collection of patterns likely to be present in the examples, assuming a simple probabilistic model, or equivalently using the Minimum Description Length (MDL) principle. Ou…

Keywords: 0303 health sciences, Computer science, String (computer science), 0102 computer and information sciences, 01 natural sciences, Substring, Combinatorics, Set (abstract data type), 03 medical and health sciences, Variable (computer science), Cover (topology), 010201 computation theory & mathematics, Simple (abstract algebra), Minimum description length, 030304 developmental biology
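The MDL reading of a ‘good covering’ can be made concrete with a small scoring function over unions of substring patterns xαy. This is a hypothetical sketch, not the paper's algorithm or probabilistic model: it assumes variables may be replaced by possibly empty strings, and the code it counts (each α spelled with an end marker, then per example a pattern index, the lengths of the parts bound to x and y, and their free symbols) is invented for illustration. The helper names `covers` and `description_length` are likewise hypothetical.

```python
import math
from typing import Sequence


def covers(alpha: str, s: str) -> bool:
    # Assumption: x and y may be replaced by any (possibly empty) strings,
    # so s is in the language of x·alpha·y exactly when alpha occurs in s.
    return alpha in s


def description_length(patterns: Sequence[str], examples: Sequence[str],
                       alphabet_size: int) -> float:
    """Hypothetical two-part code length, in bits, of `examples` under a
    union of substring patterns x·alpha·y (one alpha string per pattern)."""
    k = len(patterns)
    # Model part: spell each alpha using one extra end-of-pattern symbol.
    model_bits = sum((len(a) + 1) * math.log2(alphabet_size + 1) for a in patterns)
    data_bits = 0.0
    for s in examples:
        candidates = [a for a in patterns if covers(a, s)]
        if not candidates:
            return math.inf  # the patterns do not cover this example
        # The longest matching alpha leaves the fewest free symbols to encode.
        a = max(candidates, key=len)
        data_bits += (
            math.log2(k)  # which pattern generated s
            + 2 * math.log2(len(s) + 1)  # lengths of the parts bound to x and y
            + (len(s) - len(a)) * math.log2(alphabet_size)  # their symbols
        )
    return model_bits + data_bits
```

For the examples `["abcab", "xabcx", "qqdefq", "defz"]` over a 26-letter alphabet, the cover `["abc", "def"]` scores about 100 bits against about 119 for the trivial pattern `[""]`, while `["abc"]` alone scores infinity because it misses two examples. Longer patterns cost more to state but save bits on every example they explain; searching over candidate covers for the minimum is not shown.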

Textual data compression in computational biology: Algorithmic techniques

2012

In a recent review [R. Giancarlo, D. Scaturro, F. Utro, Textual data compression in computational biology: a synopsis, Bioinformatics 25 (2009) 1575–1586], the first systematic organization and presentation of the impact of textual data compression on the analysis of biological data was given. Its main focus was a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used, together with a technical presentation of how well-known notions from information theory have been adapted to work successfully on biological data. Rather surprisingly, the use of data compression is pervasive in computational biology. Starting from…

Keywords: Biological data, Data Compression Theory and Practice, Alignment-free sequence comparison, Entropy, Huffman coding, Hidden Markov Models, Kolmogorov complexity, Lempel–Ziv compressors, Minimum Description Length principle, Pattern discovery in bioinformatics, Reverse engineering of biological networks, Sequence alignment, Settore INF/01 - Informatica (Italian academic sector: Computer Science), General Computer Science, Computer science, Search engine indexing, Computational biology, Information theory, Information science, Theoretical Computer Science, Technical Presentation, Entropy (information theory), Data compression
Journal: Computer Science Review
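One well-known use of compression for alignment-free sequence comparison is the normalized compression distance (NCD) of Cilibrasi and Vitányi, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), which approximates an uncomputable Kolmogorov-complexity-based distance by using a real compressor for C. The sketch below uses Python's `zlib` (DEFLATE: LZ77 matching plus Huffman coding) as that compressor; the choice of `zlib`, the helper names, and the synthetic sequences are assumptions for illustration, not material taken from the review.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    # zlib implements DEFLATE: Lempel–Ziv (LZ77) matching followed by Huffman coding.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small for very similar inputs,
    close to 1 (possibly slightly above) for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length: int, rng: random.Random) -> bytes:
    # Uniformly random sequence over the alphabet ACGT (synthetic test data).
    return "".join(rng.choice("ACGT") for _ in range(length)).encode()


def mutate(seq: bytes, n_subs: int, rng: random.Random) -> bytes:
    # Overwrite n_subs distinct random positions with random bases
    # (a position may happen to receive the same base it already had).
    s = bytearray(seq)
    for pos in rng.sample(range(len(s)), n_subs):
        s[pos] = ord(rng.choice("ACGT"))
    return bytes(s)
```

With `rng = random.Random(0)`, a random 2,000-base sequence `a`, an unrelated sequence `b`, and a copy `a2` with 20 random substitutions, `ncd(a, a2)` comes out well below `ncd(a, b)`, which sits near 1. Because real compressors are imperfect, NCD of identical inputs is small but not exactly 0, so the value is best read as a relative similarity score. Inputs should stay well under zlib's 32 KB window so that the second sequence can be matched against the first.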

Complexity Selection of the Self-Organizing Map

2002

This paper describes how the complexity of the Self-Organizing Map can be selected using the Minimum Message Length principle. The use of the method in textual data analysis is also demonstrated.

Keywords: Self-organizing map, Computer science, Worst-case complexity, Data mining, Minimum description length, Selection (genetic algorithm), Minimum message length